This report is for ETC5521 Assignment 1 by Team brolga, comprising Dhruv Nirmal and Gui Gao.
Classic board games have been around for decades, bringing people together to enjoy traditional games. In Greece there are many popular board game associations and ‘fan clubs’ that organise tournaments and offer a wealth of prizes.
Today, although computer games are in a golden age of development thanks to technological support, many great board games are still released each year and attract a lot of attention.
Board Game Geek is a specialist board game website. Users can find every board game and information about it. This information includes descriptions of the games, reviews, user ratings, professional ratings, prices, where to buy and more.
My teammate and I are both interested in board games and have tried many of them. Studying and analysing this large dataset can help us understand board games from a different perspective, as well as the overall board game landscape and how it has changed over time.
We have therefore dug deeper into the data to produce summaries and data visualisations of the results.
The original data contains about 15-19 million reviews, and the dataset we use is a filtered version of it, which may affect the results when we analyse whether the mean ratings of the games follow a normal distribution.
The dataset also contains some irregularly recorded data, for example games with negative years or 0 in the yearpublished variable, which may limit our analysis.
Our data comes from Kaggle by way of Board Game Geek, with a hat tip to David and Georgios. The data can be found at the following website: https://github.com/rfordatascience/tidytuesday/tree/master/data/2022/2022-01-25
We have two initial datasets, ratings and details. The initial size of the ratings dataset is around 4.8MB and contains 21631 observations and 10 columns. The original size of the details dataset is approximately 32.7MB, with 21831 observations and 23 columns.
| variable | class | description |
|---|---|---|
| id | double | Game ID |
| name | character | Game name |
| average | double | Average rating on Board Games Geek (1-10) |

| variable | class | description |
|---|---|---|
| id | double | Game ID |
| yearpublished | double | Year game was published |
| minplayers | double | Minimum number of players required to play |
| maxplayers | double | Maximum number of players required to play |
| playingtime | double | Average playing time of a game |
| boardgamemechanic | character | Game mechanic - how to play the game (separated by comma) |
| owned | double | People who own a game |
| trading | double | People who trade a game |
| wanting | double | People who want to own a game |
| wishing | double | People who wish to own a game |
select columns: The two datasets contain about 30 distinct variables, so we use the select function to keep only the variables we need before analysing the questions.
join: Our data comes from two datasets that share some variables, so we use a join function to merge them for analysis.
separate rows: A board game may have multiple game mechanics, so I use the separate_rows function to split the mechanics at each comma before counting them.
remove NA rows: Some of the selected variables contain NA values that affect our plots, so I use the na.omit function to delete rows containing NA.
remove special characters: After splitting the game mechanics, special symbols remain inside the strings, so I use the gsub function to delete them.
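The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on made-up data, not the report's actual code: the toy values, the bracketed-and-quoted format of `boardgamemechanic`, the choice of `left_join`, and the regular expression passed to `gsub` are all assumptions.

```r
library(dplyr)
library(tidyr)

# Toy stand-ins for the two source tables (values are invented for illustration).
ratings <- tibble(
  id = c(1, 2, 3),
  name = c("Game A", "Game B", "Game C"),
  average = c(7.1, 6.4, 8.0),
  rank = c(10, 20, 5)
)
details <- tibble(
  id = c(1, 2, 3),
  yearpublished = c(1995, 2010, 2018),
  # Assumed raw format: a bracketed, quoted, comma-separated string.
  boardgamemechanic = c("['Dice Rolling', 'Hand Management']", NA, "['Set Collection']"),
  owned = c(500, 120, 900)
)

games <- ratings %>%
  select(id, name, average) %>%                    # keep only the columns we need
  left_join(
    select(details, id, yearpublished, boardgamemechanic, owned),
    by = "id"                                      # merge on the shared key
  ) %>%
  separate_rows(boardgamemechanic, sep = ",") %>%  # one row per mechanic
  mutate(
    # Drop everything except letters and spaces, then trim whitespace.
    boardgamemechanic = trimws(gsub("[^A-Za-z ]", "", boardgamemechanic))
  ) %>%
  na.omit()                                        # remove rows with any NA

games
```

Note that this simple pattern would also strip characters such as `/` or `-` that may legitimately appear inside a mechanic name, so a narrower pattern may be preferable on the real data.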
Q1. How do board game ratings change by year of publication?
Q2. Trends in game mechanics over time.
Q3. Trends in board game publication rates over time.
Q4. What are the common game mechanics and how has their prevalence changed?
Q5. What types of games are becoming increasingly popular?
Q6. How many games have each listed how many mechanics?
Q7. What is the distribution of ratings for the board games?
Q8. Is there a linear relationship between the year of release of a game and the average rating it receives?
Q9. Are Board Game Descriptions more positive or negative?
Q10. What are the 10 most common words used in board game descriptions?
Q11. Is there any relationship between average game time and ownership, min/max players?
Q12. How do the number of people who wish to own a game and the number who own it plot against each other?
Q13. How do rank and average rating plot against each other? (Highly ranked games should have higher ratings.)
Q14. Which age group prefers which type of game (e.g. medical, building)? For example, younger children might be interested in building games and teenagers in games like Monopoly.
Q15. Are people intrigued by a particular game designer?
Q16. How did the era of electronic games (starting from 2010-11) affect the number of people who want, wish for, or own board games?
Q17. Is it right to assume that a board game publisher publishes a single type of board game, and do publishers focus only on a certain age group?
Q18. Are people intrigued by a particular game publisher?
Q19. How accurate has the Bayes average been?
Q20. Which era or decade played a big role in people playing more board games?
Q1. What are the common game mechanics and how has their prevalence changed?
Q2. What is the distribution of ratings for the board games?
Q3. Is there any relationship between the average playing time of a game and ownership (or products sold), or the min/max players required to play that game?
Q4. How did the era of electronic games (starting from 2010-11) affect the number of people who want, wish for, or own board games?
Q1. The most common game mechanics may be ‘Acting’, ‘Dice Rolling’ and ‘Hand Management’. I guess many game mechanics will become more popular.
Q2. I guess the ratings for the games follow a normal distribution.
Q3. A longer average playing time might mean fewer owners, as people might not want to invest too much time in a game, but such a game might be popular among bigger groups because board games keep a group engaged. However, as the required number of players increases, the number of products sold should drop.
Q4. The demand for board games should be lower after the start of the electronic games era.
| boardgamemechanic | n |
|---|---|
| Dice Rolling | 6112 |
| Hand Management | 4421 |
| Set Collection | 2936 |
| Variable Player Powers | 2719 |
| Hexagon Grid | 2371 |
| Simulation | 2099 |
| Card Drafting | 1869 |
| Tile Placement | 1805 |
| Modular Board | 1697 |
| Grid Movement | 1635 |
According to the table and column plot above, the most common game mechanic is, unsurprisingly, Dice Rolling. Dice rolling is a mechanic that can be used in many games and has been around for a long time: ancient peoples could make simple dice out of stones, clay, bones, etc., so dice are often seen as the most dominant symbol of board games (Yermolaieva & Brown, 2017).
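A frequency table like the one above can be produced by counting rows in a long-format table with one row per game and mechanic. The sketch below uses invented data, and the object names `mechanics_long` and `top_mechanics` are hypothetical.

```r
library(dplyr)

# Hypothetical long-format data: one row per (game, mechanic) pair.
mechanics_long <- tibble(
  id = c(1, 1, 2, 3, 3, 4),
  boardgamemechanic = c("Dice Rolling", "Hand Management", "Dice Rolling",
                        "Set Collection", "Dice Rolling", "Hand Management")
)

top_mechanics <- mechanics_long %>%
  count(boardgamemechanic, sort = TRUE) %>%  # column n holds the frequency
  slice_head(n = 10)                         # keep at most the ten most common

top_mechanics
```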
Before analysing this question, I made an inference based on the experience of my family and friends: the most common board game mechanics will become more and more popular. The lollipop plot above shows that the top 20 most common game mechanics have become more popular over the past few decades, which is consistent with my hypothesis.
This is probably because modern board games include more mechanics and the variety of games has grown richer over time, so board games as a whole appeal to a wider audience. Another likely reason is that a better standard of living and more free time increase the demand for entertainment products such as board games.
| statistic | p.value | method | alternative |
|---|---|---|---|
| 0.02198876 | 1.6e-09 | Asymptotic one-sample Kolmogorov-Smirnov test | two-sided |
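The table above has the shape of a tidied Kolmogorov-Smirnov test result. A minimal sketch of such a test, run on simulated rather than real ratings, might look like the following. The vector name `avg_rating` and the choice to compare against a normal distribution with the sample mean and standard deviation are assumptions, not necessarily what was done in this report.

```r
set.seed(1)

# Simulated, right-skewed values used only for illustration (not real ratings).
avg_rating <- rexp(2000, rate = 1)

# One-sample KS test against a normal distribution whose parameters
# are estimated from the same sample.
ks_result <- ks.test(
  avg_rating, "pnorm",
  mean = mean(avg_rating), sd = sd(avg_rating)
)

# Collect the pieces shown in the table into a one-row data frame.
ks_summary <- data.frame(
  statistic = unname(ks_result$statistic),
  p.value = ks_result$p.value,
  method = ks_result$method,
  alternative = ks_result$alternative
)

ks_summary
```

A small p-value is evidence against normality. Because the mean and standard deviation are estimated from the data being tested, the reported p-value is only approximate, and with many observations even small departures from normality tend to be flagged.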
Figure 5.1: Relationship between average game time and games owned
##
## Call:
## lm(formula = owned ~ playingtime, data = Q3_dataset)
##
## Coefficients:
## (Intercept) playingtime
## 1490.36912 -0.02701
Figure 5.2: Relationship between average game time and games owned, faceted by the minimum number of players required
| minplayers | sum |
|---|---|
| 0 | 16197 |
| 1 | 6801485 |
| 2 | 20498836 |
| 3 | 3295642 |
| 4 | 631582 |
| 5 | 174119 |
| 6 | 26102 |
| 7 | 4258 |
| 8 | 29110 |
| 10 | 130 |
The plot in Figure 5.1 suggests that as the average playing time of a board game increases, the number of people who own it decreases. I had expected this because games that take a long time to finish may be a less popular option for people who are short of time.
To verify this, I fitted a linear model of owned on playingtime and found that the slope was negative (about -0.027).
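The model call shown in the output above is `lm(owned ~ playingtime, data = Q3_dataset)`. The sketch below repeats that kind of fit on simulated data, so the data frame `Q3_toy`, its values, and the true slope of -0.03 are assumptions made purely for illustration.

```r
set.seed(42)

# Simulated data: ownership falls slightly as playing time increases.
Q3_toy <- data.frame(playingtime = runif(500, min = 10, max = 600))
Q3_toy$owned <- 1500 - 0.03 * Q3_toy$playingtime + rnorm(500, sd = 5)

# Simple linear regression of ownership on playing time.
fit <- lm(owned ~ playingtime, data = Q3_toy)

slope <- coef(fit)[["playingtime"]]

# Estimate, standard error, t value and p-value for each coefficient.
coef_table <- summary(fit)$coefficients

slope
coef_table
```

Looking at the standard error and p-value in `coef_table`, and not only at the sign of the slope, helps judge whether a small negative estimate is distinguishable from zero.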
A game with a longer playing time might sell fewer copies, yet it could still be popular with people who play in big groups. I expected that as the number of players required to play a game increases, the game’s sales should drop. Figure 5.2 shows that as the minimum number of players required increases, the number of copies owned, our proxy for products sold, decreases drastically (see also 5.3), as smaller groups of people can get together to play board games more often.
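A per-group total like the `sum` column in the table above can be computed by aggregating over `minplayers`. The sketch below assumes that the column being summed is `owned`, which the table itself does not state, and uses invented values. It also reports the number of games and the mean per game, since a total depends on how many games fall into each group.

```r
# Invented data: minimum players and number of owners for six games.
toy <- data.frame(
  minplayers = c(1, 2, 2, 3, 4, 2),
  owned = c(100, 250, 300, 80, 20, 50)
)

# tapply() and table() both order groups by the sorted values of minplayers.
summary_tbl <- data.frame(
  minplayers = sort(unique(toy$minplayers)),
  total_owned = as.vector(tapply(toy$owned, toy$minplayers, sum)),
  n_games = as.vector(table(toy$minplayers)),
  mean_owned = as.vector(tapply(toy$owned, toy$minplayers, mean))
)

summary_tbl
```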
There is now almost no market for new casual board games, and even the classics sell only a fraction of what they sold annually just a few years ago. The advent of apps had a much more profound effect, absolutely devastating sales of casual board games but paradoxically increasing interest in German-style games.
Many popular mobile games such as Temple Run and Subway Surfers were released around 2012-13, which led to the assumption that, with games more easily accessible than ever, the digital games era must have hurt the turnover of board game publishers.
The vertical line in the plot marks the year 2013, after which there was a drastic drop in the number of people who own, want, or wish for new board games. The number of people owning a game peaked around 2014 and then fell to roughly one fifth of that peak.
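A yearly series of this kind can be built by pivoting the interest columns into long format and summing them by publication year. The following is a minimal sketch on invented numbers. The object names and the choice of `pivot_longer` followed by `summarise` are assumptions about how such a series could be prepared, not a record of the report's code.

```r
library(dplyr)
library(tidyr)

# Invented data: one row per game.
toy <- tibble(
  yearpublished = c(2012, 2012, 2013, 2014),
  owned = c(100, 50, 200, 30),
  wanting = c(10, 5, 20, 3),
  wishing = c(40, 10, 60, 9)
)

interest_by_year <- toy %>%
  pivot_longer(
    c(owned, wanting, wishing),
    names_to = "measure",        # which kind of interest
    values_to = "people"         # how many people
  ) %>%
  group_by(yearpublished, measure) %>%
  summarise(people = sum(people), .groups = "drop")

interest_by_year
```

The long format makes it straightforward to draw one line per measure. Because totals are grouped by publication year, games published recently have had less time to accumulate owners, which is worth keeping in mind when reading the end of the series.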
The top 20 most common game mechanics have become more popular over the past few decades, with Dice Rolling the most common mechanic. We found this through text analysis of the selected columns, using functions such as separate_rows and gsub. Plotting a histogram and performing a KS test on the board game ratings led us to conclude that the ratings in this dataset do not follow a normal distribution. The average playing time of a board game and the minimum number of players required are both negatively associated with the number of copies owned. We examined the linear model coefficients to confirm this assumption and observation.
Data pivoting enabled us to rearrange the columns and rows so that we could view the data from different perspectives. The era of mobile games appears to have affected people’s interest in board games negatively, as the number of games owned, wanted, wished for, and traded dropped drastically after 2012-13.
Lüdecke et al. (2021). performance: An R Package for Assessment, Comparison and Testing of Statistical Models. Journal of Open Source Software, 6(60), 3139. https://doi.org/10.21105/joss.03139
Robinson D (2022). drlib: Personal R package of David Robinson. R package version 0.1.1.
Robinson D, Hayes A, Couch S (2022). broom: Convert Statistical Objects into Tidy Tibbles. R package version 1.0.0, https://CRAN.R-project.org/package=broom.
Sievert C (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC, Florida.
Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York.
Wickham H (2022). stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.4.1, https://CRAN.R-project.org/package=stringr.
Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686.
Wickham H, Girlich M (2022). tidyr: Tidy Messy Data. R package version 1.2.0, https://CRAN.R-project.org/package=tidyr.
Xie Y (2022). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.39.
Yermolaieva S, Brown JA (2017). Dice design deserves discourse. Game & Puzzle Design, 3(2), 64-70.
Zhu H (2021). kableExtra: Construct Complex Table with ‘kable’ and Pipe Syntax. R package version 1.3.4, https://CRAN.R-project.org/package=kableExtra.